Application of two statistical approaches (Bayesian Kernel Machine Regression and Principal Component Regression) to assess breast cancer risk in association to exposure to mixtures of brominated flame retardants and per- and polyfluorinated alkylated substances in the E3N cohort

Background Brominated flame retardants (BFR) and per- and polyfluorinated alkylated substances (PFAS) are two groups of substances suspected to act as endocrine disruptors. Such substances could therefore be implicated in the occurrence of breast cancer, nevertheless, previous studies have led to inconstant results. Due to the large correlation between these substances, and the possibly non-linear effects they exert, evaluating their joint impact as mixtures on health remains challenging. This exploratory study aimed to generate hypotheses on the relationship between circulating levels of 7 BFR (6 polybrominated diphenyl ethers and 1 polybrominated biphenyls) and 11 PFAS and the risk of breast cancer in a case–control study nested in the E3N French prospective cohort by performing two methods: Principal Component Regression (PCR) models, and Bayesian Kernel Machine Regression (BKMR) models. Methods 194 post-menopausal breast cancer cases and 194 controls were included in the present study. Circulating levels of BFR and PFAS were measured by gas chromatography coupled to high-resolution mass spectrometry and liquid chromatography coupled to tandem mass spectrometry, respectively. The first statistical approach was based on Principal Component Analysis (PCA) followed by logistic regression models that included the identified principal components as main exposure variables. The second approach used BKMR models with hierarchical variable selection, this latter being suitable for highly correlated exposures. Both approaches were also run separately for Estrogen Receptor positive (ER +) and Estrogen Receptor negative (ER-) breast cancer cases. Results PCA identified four principal components accounting for 67% of the total variance. Component 3 showed a marginal association with ER + breast cancer risk. No clear association between BFR and PFAS mixtures and breast cancer was identified using BKMR models, and the credible intervals obtained were very wide. Finally, the BKMR models suggested a negative cumulative effect of BFR and PFAS on ER- breast cancer risk, and a positive cumulative effect on ER + breast cancer risk. Conclusion Although globally no clear association was identified, both approaches suggested a differential effect of BFR and PFAS mixtures on ER + and ER- breast cancer risk. However, the results for ER- breast cancer should be interpreted carefully due to the small number of ER- cases included in the study. Further studies evaluating mixtures of substances on larger study populations are needed. Supplementary Information The online version contains supplementary material available at 10.1186/s12940-022-00840-4.

Background Polybrominated diphenyl ethers (PBDE) and polybrominated biphenyls (PBB) are two families of brominated flame retardants (BFR) that have been greatly used in a wide range of consumers' products to reduce their flammability. Although their use has been progressively limited and banned in Europe during the 90 s, due to their resistance to degradation, PBDE and PBB are widespread in the environment [1]. The long-term toxic effects of PBDE and PBB in humans are not completely elucidated, but they are known to have endocrine disrupting properties and in 2019 PBDE have been included in the high-priority list of agents not previously evaluated by the International Agency for Research on Cancer (IARC) Monographs based on relevant bioassay and mechanistic studies [2,3].
Per-and polyfluorinated alkylated substances (PFAS) are a wide group of synthetic compounds that are water-and oil-repellent and have been used in a large number of industrial and consumer applications [4]. PFAS are characterized by long half-lives in the biota and humans and biomonitoring studies have suggested that PFOA and PFOS, the two main PFAS representatives, are ubiquitously present in the blood of humans worldwide [5]. PFAS are strongly suspected to act as endocrine disrupting chemicals (EDCs), and PFOA has been classified by the IARC as "possibly carcinogenic to humans" (Group 2B) [3,6].
The incidence of breast cancer has risen in the past decades among Western populations but, despite a large body of research, the etiology has not yet been fully delineated, as established risk factors cannot solely explain this trend [7][8][9]. There is growing concern that exposure to chemical environmental contaminants, particularly EDCs, could be one of the factors that led to an increased incidence of breast cancer in the Western world [10]. Actually, the impact of mixtures of environmental chemicals along the carcinogenic process can be triggered or boosted by individual chemicals through different mechanisms featuring the key characteristics of carcinogens [11,12]. For instance, some experimental studies have highlighted a stimulation of human breast cancer cell proliferation, especially for estrogen-sensitive cells, by some PBDE congeners or mixtures [13,14]. In addition, a study investigating their mechanisms of action in vitro revealed that some PBDE congeners could act as agonists or antagonists of estrogen receptors [15]. Concerning PFAS, studies on human breast cancer cells have also highlighted possible mechanisms of regulations of estrogen receptors by some congeners [16,17]. Indeed, the possible effect of these substances on breast cancer risk might be different according to the tumor estrogen receptor status. Previous epidemiological studies have attempted to elucidate the relationship between internal exposure levels of EDCs, such as PFAS and BFR, with breast cancer risk, leading to contradictory results [18][19][20][21][22][23][24][25][26][27][28][29]. In particular, concerning BFR, a previous study conducted in the E3N French prospective cohort using single-pollutant models has identified a non-linear negative association between circulating levels of PBDE-100 and PBDE-153 and breast cancer risk [23]. Concerning PFOS and PFOA, by applying single-pollutant models on data from the E3N cohort data, a positive linear association between circulating level of PFOS and estrogen receptor positive (ER +) breast cancer risks was highlighted, while a positive non-linear association was found between both PFOS and PFOA and estrogen receptor negative (ER-) breast cancer risk [18]. Notably, epidemiological studies have traditionally focused on estimating the health impact of exposure to individual substances not taking into account that in reality people are exposed to complex chemical mixtures. Therefore, estimating the health effects of several concurrent exposures is of increasing interest in epidemiology and has been identified as a research priority by several national and international scientific and advisory organizations [30,31].
There are several challenges to modeling the effects of exposure to chemical mixtures [32,33]. Firstly, mixtures components may interact with each other leading to synergistic or antagonistic effects. Secondly, a non-linear exposure-response relationship between exposure to mixtures and health effects may occur, especially in the context of exposure to EDCs. Finally, mixture components are often strongly correlated, leading to large uncertainty in the effect estimates.
Bayesian Kernel Machine Regression (BKMR) is a novel statistical method that addresses the above-mentioned challenges. Indeed BKMR estimates the exposureresponse surface accounting for interactions and nonlinear relationships by flexibly modeling exposures and it can handle multicollinearity between substances by applying hierarchical variable selection [34][35][36][37][38].
Another classic method used in epidemiology to deal with multicollinearity of multiple exposure variables is Principal Component Analysis (PCA). PCA projects the original data points living in the space of the original, and possibly correlated, variables onto a lower dimensional subspace whose orthogonal axes, referred to as principal components, are uncorrelated, in a way that minimizes the average projection error. Principal components can then be used to predict the outcome of interest by means of classic regression models. [39,40]. This combination of PCA and regression analysis is referred to as Principal Component Regression (PCR).
The objective of this exploratory study is to generate hypotheses on the relationship between circulating levels of 7 BFR (6 PDBE and 1 PBB) and 11 PFAS and the risk of breast cancer in a case-control study nested in the E3N French prospective cohort using two methods: PCR models, and BKMR models. More specifically, PCR models can highlight profiles of exposures associated with breast cancer, accounting for less informative variables and managing the high dimensionality with orthogonal scores [41]. By construction, these profiles are linear combinations of exposures. Thus by assessing the association of profiles with breast cancer status using logit regression we work under the assumption of linear exposure-response functions. The BKMR approach complements this analysis by identifying predefined groups of substances (variable selection) and by modeling exposure-response functions more flexibly, in particular without assuming their linearity and allowing for interactive effects [33].

The E3N cohort
The E3N study is a large ongoing French prospective cohort of women, set up in France in 1990. The study was approved by the French National Commission for Data Protection and Privacy (ClinicalTrials.gov identifier: NCT03285230); all participants gave written informed consent. The detailed protocol has been described elsewhere [42]. Briefly, 98 995 women born between 1925 and 1950, were included from the French national health insurance plan for people working for the national education system, the Mutuelle Générale de l'Education Nationale (MGEN). Women were enrolled in the cohort through a self-administered questionnaire, and were followed by self-administered questionnaires every two years. Detailed cancer risk factor data were collected through questionnaires at different time points during follow-up, including reproductive history, use of hormonal treatments, anthropometric characteristics, smoking habits, alcohol consumption, diet, and physical activity. The average follow-up rate per questionnaire cycle has been of 83%, and to date, the total loss to follow-up since 1990 has been < 3%.
Between 1994 and 1999, E3N participants were invited to donate blood, resulting in the collection of blood samples from approximately 25 000 participants. Each sample was separated into 28 aliquots (i.e. plasma, serum, buffycoat, leukocytes, and erythrocytes) that were stored in plastic straws in liquid nitrogen containers (− 196 °C) in a biobank.
Breast cancer cases were identified through selfreporting in the questionnaires, from the MGEN files, or through information from death certificates. Deaths were reported by family members and by searches in the MGEN files and causes of death were obtained from the National Death Index. Pathology reports were obtained for the 93% of incident cases. We also considered cases for which pathology reports have not been obtained, because the proportion of false-positive self-reports was low in our study population (< 5%). Cases were identified up to 2013, which was therefore used as the date of end of follow-up in statistical analyses.

The nested case-control study on breast cancer
As previously described elsewhere [18], we identified 281 breast cancer cases for which at least three aliquots of serum and six aliquots of plasma were available in the biobank. From these, we excluded all cases who had not completed the dietary questionnaire in 1993 (n = 27) or who were diagnosed before the blood sampling and/or before the dietary questionnaire (n = 11). Cases of Paget's disease and benign breast disease were also excluded (n = 3). Finally, 240 incident breast cancer cases were available. Due to budget constraints, among those, 194 incident postmenopausal breast cancer cases were randomly selected and included in the study. For each case, one control was sampled from women who were free of breast cancer at the time of diagnosis of the corresponding case (density sampling method). Controls were matched to cases by age (± 2 years), menopausal status at blood collection (premenopausal or postmenopausal), body mass index (BMI) at blood collection (< 25 or ≥ 25 kg/m2) and year of blood collection.

Measurement of biomarkers of exposure
Laboratory analysis of BFR and PFAS have been detailed elsewhere [18,23]. Briefly, measurements of PFAS in serum were based on a preliminary alkaline digestion followed by a two-stage Solid Phase Extraction purification using polymeric Oasis HLB and graphitized carbon (ENVI-Carb) cartridges, before liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) measurement. Analysis of BFR involved a liquid/liquid extraction with pentane followed by determinations with gas chromatography (Agilent 7890A) coupled to high-resolution mass spectrometry (GC-HRMS). The quantification was conducted using the isotopic dilution method with 13

Covariates
Logistic regression and BKMR models were adjusted for total plasma lipid content by the addition of a separate term in the model (ng/L, continuous), as recommended by Schisterman et al. [43]. The models were further adjusted for smoking status (never vs. ever), physical activity measured in metabolic equivalent tasks (MET)hour/week (continuous), education (≤ 12 years, 12 to 14 years, > 14 years), personal history of benign breast disease (no vs. yes), family history of breast cancer (none, in first-degree relatives, in extended relatives), parity and age at first full-term pregnancy (FFTP) (no children, 1 or 2 children and < 30 years old at FFTP, ≥ 3 children and < 30 years old at FFTP, ≥ 30 years old at FFTP), total breastfeeding duration (never, ≤ 6 or > 6 months), age at menarche (years, continuous), current use of menopausal hormone therapy (yes, no), use of oral contraceptives (ever vs. never), age at menopause (menopause before age 51, menopause at age 51 or later). For the variables measured in different questionnaires, we took the value declared in the last questionnaire completed before the date of diagnosis of the case; for controls, the date of diagnosis of the matched case was used. Since BKMR and non-conditional logistic regression models do not account for case-control matching, these models were additionally adjusted for the matching criteria: age at blood draw (years, continuous), BMI at blood draw (kg/ m 2 , continuous), and year of blood draw (continuous), except for menopausal status at blood draw, in order to avoid over-adjustment due to the a priori inclusion of age at menopause in the model.
The selection of confounders was done a priori, based on the known breast cancer risk factors available in the E3N dataset that are potentially associated with the exposures considered in the present study.
In our study population, missing values were < 5% for all variables and were imputed to the median (continuous variables) or modal value (categorical variables).

Statistical analyses
Substances for which more than 25% of the values were below the limit of detection (LOD) were eliminated from the analysis (namely PBFS, PFDS, PFBA, PFPA, PFHxA, PFDoA). For those substances which had 25% or less of the values below the LOD, these last were imputed to ½ of the LOD.
Demographic characteristics of the study participants were reported using means and standard deviations or counts with percentages. Univariate conditional logistic regression models were performed to compare concentrations of PFAS and BFR between cases and controls. Exposure to substances were log-transformed for the analyses. Correlations between log-transformed PFAS and BFR concentrations were assed using Pearson's correlation tests.

Approach 1 -Principal Component Regression (PCR)
A PCA with varimax rotation was performed on the matrix of log-transformed biomarkers, to identify a reduced number of uncorrelated components representing the exposure to substances. The number of retained components was chosen using several indicators: individual and cumulated explained variance, and interpretability and coherence of identified components [40]. Then, multiple logistic regression models were fitted, with the identified components scores as main exposure variables (continuous and categorized in quintiles groups based on the adherence distribution to the different components). In order to verify if the same patterns of exposure were observed among cases and controls, PCA was also performed separately among these two groups.
In order to investigate the hypothesis of a differential relationship between exposure to PFAS and BFR and breast cancer risk based on the tumor expression of estrogen receptors (ER-vs. ER +), the components scores previously identified were also included in two separate models: the first model included ER + cases and all controls, while the second model included ERcases and all controls. Cases for which information on the estrogen receptors' expression was missing were excluded from the analysis.
All these models were adjusted for the covariates described above, including matching criteria.
As a sensitivity analysis, conditional logistic regression models accounting for the matching of cases and controls were run on the overall study population, but not on specific ER + and ER-subpopulations. These models were not adjusted for the matching criteria.

Approach 2 -Bayesian Kernel Machine Regression (BKMR)
BKMR was proposed as a new approach to assess the effect of exposure to chemical mixtures on health [34]. An R package ('bkmr') exists for this purpose, with the possibility of adapting the model to binary outcomes, like breast cancer [35]. In the present study, the hierarchical variable selection mode was used. There are two possible levels of variable selection: a group selection, corresponding to BFR on the one hand, and PFAS on the other hand, and an individual substance selection within each group.
The model for binary outcome using the Probit link function with hierarchical variable selection is as follows: For each subject i = 1,…,n: where: Φ −1 = probit link function; Y i = binary outcome (0/1), here breast cancer; , …,z iM , = exposure variables of group 1, here 11 PFAS (ng / mL of serum) as continuous, log-transformed and centered variables (i.e. subtraction of the mean); v i1 ,…,v iN = exposure variables of group 2, here 7 BFR (ng / L of plasma) as continuous, log-transformed and centered variables (i.e. subtraction of the mean); h = flexible function of exposure variables, specified using a kernel function (exposure-response function); x i = vector of covariates (see the list below); β = vector of the corresponding coefficients; ε i = residuals.
First, the BKMR model with a hierarchical selection of variables provides two types of posterior inclusion probabilities (PIPs): the PIPs of each of the two groups (BFR and PFAS), and the conditional PIPs of each substance within groups. The PIPs are indicators of the contribution of each group or substance in relation to the outcome.
Secondly, the BKMR method estimates a univariate exposure-response function for each substance. This function consists of a section of the function h quantifying the relationship between a given substance and the outcome, while all other exposure variables are set at their median value. In the specific case of a binary outcome, these sections of h can be interpreted as the relationship between the exposure variable and an underlying continuous latent outcome, such as a biomarker of health status underlying the binary outcome [35].
Then, the BKMR method provides bivariable exposure-response functions, which are estimates of the relationship between exposure to a given substance and the outcome while one other substance is fixed at predefined percentiles (20 th , 50 th , and 80 th ) and all the others are fixed at their median value. This approach allows to highlight interactions between pairs of substances: an alteration in the dose-response curve of one substance at a different percentile of another substance suggests an interaction, while parallel lines indicate no interaction. With the hierarchical variable selection mode, only interactions between substances not belonging to the same group can be identified.
Finally, the cumulative effect of the overall exposure to the substances is provided by calculating the differences between the estimated value of h when all substances are fixed at a predefined percentile, compared to the estimated value of h when all substances are fixed at the 50 th percentile, used as reference.
In the present study the number of iterations of the Markov Chain Monte Carlo sampler was set at 100,000 with non-informative default priors defined by the package.
As for approach 1, the model was performed first in the overall study population, and then separately for Frenoy et al. Environmental Health (2022) 21:27 ER + cases and all controls, and for ER-cases and all controls, as described above.
As a sensitivity analysis, based on the results obtained from Pearson's correlation tests and PCA, BKMR was also performed with a hierarchical selection including three groups: PFAS, PBDE, and, separately, PBB. All other parameters were kept identical to the main model.
The threshold for statistical significance was set at 5% and all statistical tests were two-sided. Statistical analyses were performed using SAS (version 9.4, SAS Institute) and R (version 4.0.3). Table 1 shows summary statistics for each variable for the overall study population, as well as separately for cases and controls. Mean age at diagnosis was 68.5 years (range: 58.3-84.9 years). Among cases, the average follow-up time between blood sample collection and diagnosis was 12.2 years. Information on tumor estrogen receptor expression was available in 158 cases (81%) and among these 132 were ER + .

Descriptive analyses
Among BFR, PBDE-47 (4.70 ng/L) was the congener with the highest average concentration in the overall population, followed by PBDE-153 (3.14 ng/L) and PBB-153 (2.03 ng/L). PFOS (19.08 ng/mL) and PFOA (7.35 ng/mL) were the two PFAS with the highest average concentration in the study population. The mean and standard deviation of each exposure variable in the total study population, and for cases and controls separately, is presented in Table 2. Overall, the results of Pearson's rank correlation tests, reported in Supplementary material Fig. 1, highlighted how all PBDE were positively and strongly correlated, while no or weak correlations were observed between PBDE and PBB-153. Also PFAS were generally positively correlated, while between PFAS and PBDE, as well as between PFAS and PBB-153, no or weak correlations were observed.

Approach 1 -Principal Component Regression
PCA identified four main components accounting respectively for the 29%, 22%, 9%, and 8% of the total variance. Loading factors for each chemical on each component are presented in Supplementary material Results for logistic regression models are presented in Table 3. When including all breast cancer cases, a positive association between Component 3 in quintiles and breast cancer risk was observed, with Odds Ratios (OR) > 1 for all quintile groups when compared to the first quintile group (global p-value = 0.09).
Results were similar when including only ER + cases: Component 3 in quintiles was positively associated with ER + breast cancer risk, with OR > 1 for all quintile groups when compared to the first quintile group (global p-value = 0.04). Also, an association was highlighted for Component 4, with OR > 1 for the second and third quintile groups, and OR < 1 for the fourth and fifth quintile groups (global p-value = 0.05).
Finally, when including only ER-cases, an association was identified for Component 1 in quintiles, with OR > 1 for the second, third, and fourth quintile groups, and OR < 1 for the fifth quintile group (global p-value = 0.06). In addition, it has to be taken into account that all the confidence intervals were wide for this analysis due to the limited number of cases included (n = 26).

Sensitivity analysis
When performing conditional logistic regression models including the overall population, results were similar to those obtained with unconditional logistic regression models in the overall population (Supplementary material Table 3).
Regarding the relationship between each substance and breast cancer risk while setting all other exposure variables at their median value, concerning the BFR, a positive trend was observed with PBDE-28, and PBDE-99, while an inverse relationship was observed with PBDE-47 and PBDE-100. Concerning PFAS, a positive trend was observed for PFOS, PFOSA, and N_MeFOSAA, while a negative trend was observed for N_EtFOSAA, and PFUnA ( Fig. 1-A).
Regarding the interactions plots, no interactions were identified between substances across groups (Supplementary material Fig. 3).
Finally, the cumulative effect of PFAS and BFR, meaning the cumulative effect estimate comparing all substances at their median concentrations (reference) to the concentrations corresponding to each 5th percentile from the 25th to the 75th percentile, was close to zero ( Fig. 2-A).
When performing the analyses including only ER + cases, the PIPs of BFR and PFAS were respectively 0.48 and 0.66. The conditional PIPs of each of the substances within BFR were similar to those observed when  Fig. 2-B). The univariate exposure-response functions showed similar results as for the overall population, but with steeper slopes, especially for PBDE-47, PBDE-99 and PFOSA ( Fig. 1-B). No interactions were identified between substances across groups (data not shown). A slightly positive association was observed between PFAS and BFR cumulative effect and ER + breast cancer risk (Fig. 2-B). The PIPs of BFR and PFAS, when including only ERbreast cancer, were respectively 0.74 and 0.62. The highest conditional PIPs were observed for PBDE-100 (0.25) and PBDE-99 (0.18) concerning BFR, while with regards to PFAS the substance with the highest conditional PIP was N_MeFOSAA (0.23) (Supplementary material Fig. 2-C). The results obtained from the univariate exposureresponse functions, highlighted an inverse association between PBDE-99 as well as PBDE-100 and ER-breast cancer risk, and a positive association, between N_ MeFOSAA and ER-breast cancer risk. (Fig. 1-C). No interactions were identified between substances across groups (data not shown). Finally, the cumulative effect of PFAS and BFR was associated with a decrease of ERbreast cancer risk (Fig. 2-C).

Sensitivity analysis
When running BKMR including PBDE and PBB as separate groups, the PIPs were 0.38, 0.33, and 0.50 for PBDE, PBB, and PFAS, respectively. The conditional PIPs were similar to those obtained in the main analysis, expect for PBB-153 which, being the only substance included in the PBB group, had a conditional PIP equal to 1.00. The results of the univariate exposure-response functions were comparable to those obtained in the main analysis, although the associations were overall attenuated (data not shown). No interactions were identified between substances across groups (data not shown). The cumulative effect of PFAS, PBDE, and PBB on the risk of breast cancer was close to zero (Supplementary material Fig. 4).

Discussion
In this study, we explored the association between exposure to BFR and PFAS as a mixture and breast cancer risk, comparing two statistical approaches, PCR and BKMR. Globally, no clear association between mixtures of BFR and PFAS circulating levels with breast cancer risk was highlighted and no interaction between substances was identified. However, both approaches suggested a differential effect on ER + and ER-breast cancer risk, although the results for ER-breast cancer should be interpreted carefully due to the small number of ER-cases included in the study.  All models are adjusted on age at blood draw (continuous), body mass index at blood draw (kg/m 2 , continuous), year of blood draw (continuous), total plasma lipid content (ng/L, continuous), smoking status (never vs. ever), physical activity measured in metabolic equivalent tasks (MET)-hour/week (continuous), education (≤ 12 years, 12 to 14 years, > 14 years), personal history of benign breast disease (no vs. yes), and family history of breast cancer (none, in first-degree relatives, in extended relatives), parity and age at first fullterm pregnancy (FFTP) (no children, 1 or 2 children and < 30 years old at FFTP, Regarding results obtained with PCR, an association between Component 3 (mainly composed of PFOSA, N_ MeFOSAA and N_EtFOSAA) and ER + breast cancer risk was identified. A non-linear association could be noticed for Component 4 (mainly composed of PBDE 153, PBB 153, PFDA and PFUnA). As expected, results for overall breast cancer risk were similar to those obtained for ER + since these represented the majority (81%) of breast cancer cases included in the study.
In the ER-stratum, a non-linear association could be noticed for Component 1 (mainly composed of PBDEs). The small number of ER-cases could explain the lack of statistically significant effect.
When performing BKMR analyses for overall and ER + breast cancer risk, results were also very similar, but globally attenuated for overall breast cancer risk. The PFAS group had a slightly higher PIP than the BFR group, suggesting that PFAS could play a more important role than BFR in the occurrence of breast cancer. Moreover, conditional PIPs highlighted that PBDE-153 (positive relationship) and PBB-153 (negative relationship), among the BFR group, and PFOSA (positive relationship), among the PFAS group, played the most important roles in the relationship with overall and ER + breast cancer risk. A positive cumulative effect of BFR and PFAS on ER + breast cancer risk was also highlighted. No effect was seen for overall breast cancer risk, probably due to a dilution effect due to the inclusion of ER-cases.
When performing analyses including only ER-breast cancer cases, PIPs were instead slightly higher for the BFR group than the PFAS group. In addition, the substances that had the highest PIPs were PBDE-100 and PBDE-99, in the BFR group, and N_MeFOSAA, in the PFAS group. Regarding the relationship between each substance and ER-breast cancer risk, several differences were noticed in comparison to ER + breast cancer risk. Indeed, PBDE-99 and ER-breast cancer risk showed a negative relationship; stiffer slopes were identified for PBDE-100 (negative relationship) and for N_MeFOSAA (positive relationship) compared to ER + breast cancer risk, while the slopes for PFOSA and PFOS were milder. Finally, the results suggest a negative cumulative effect of BFR and PFAS on ER-breast cancer risk.
When comparing the results obtained with the two approaches tested in the present study, some consistency can be noticed. Indeed, with regard to ER + breast cancer risk, an association was shown with Component 3, and a non-linear association for Component 4. This was overall coherent with the results of the BKMR approach, which identified among the most important substances contributing to ER + breast cancer risk, PBDE-153 and PBB-153, with a positive and a negative relationship, respectively. These two substances were in fact contributing strongly to Component 4, and this could explain the non-linear association identified for this Component. Concerning the PFAS group, the BKMR approach identified PFOSA among the substances with higher PIP, highlighting a positive relationship with ER + breast cancer risk. PFOSA, in turn, was also an important contributor to Component 3 and could be responsible for the association identified between this component and ER + breast cancer risk.
With regard to ER-breast cancer risk, only a non-linear association was highlighted for Component 1. Among the substances with the higher PIPs identified by the BKMR approach, some were contributing to Component 1, such as PBDE-99 and PBDE-100, having both a negative relationship with ER-breast cancer risk. However, numerous other substances contributed to Component 1, with opposite relationships with ER-breast cancer risks and this could explain the non-linear association found with this Component.
PCA, a dimension reduction method classically used in epidemiology to deal with multicollinearity, transforms a large set of variables into a smaller one, minimizing the loss of information. It does so by creating new uncorrelated variables, called components, on the basis of correlations between the initial variables [40]. In our study, PCA has identified components that were linear combination of initial exposures, and logistic regression models further allowed to identify some associations between these components and breast cancer risk. Thereby this approach does not allow concluding whether all the substances contributing to the component, or rather only some of them, are responsible for the relationship with the outcome. Given that components are linear combinations of substances, this approach does not allow either to identify interactions between substances.
The BKMR approach allows to model non-linear and non-additive relationships between substances and outcome, accounting for confounding variables [34]. For hierarchical variable selection, suitable for studying multiple correlated substances, groups of substances are built a priori, based on the correlations between substances but also on the known potential biological mechanisms [24,35]. In this study, we only tested two and three groups, separating the main families of substances known to have similar chemical structure and industrial use. Finer groupings of substances according to different biological mechanisms could be considered in future studies, thus testing other hypotheses. In our study, results obtained with BKMR models allowed to generate hypothesis concerning which substances play the most important role in the relationship with breast cancer risk by means of the PIPs and the shape of the slopes. While this method allows identifying interactions between substances across groups, no interaction has been identified in the present study, possibly due to the lack of statistical power.
More precisely, regarding the hypothesis of a positive association between PFOSA and both ER + and all breast cancer risk generated by our study, one previous casecontrol study has also identified a positive relationship between PFOSA and all breast cancer risk [21], while another one has identified no association for all breast cancer risk or ER + breast cancer risk [20]. Concerning the hypothesis of a positive association between PBDE-153 and both ER + and all breast cancer risk generated by our study, one previous case-control study has also identified a positive relationship between PBDE-153 and all breast cancer risk but not ER + breast cancer risk [27], while others studies have identified no associations [24,26]. Previous analyses in the same E3N nested case-control population using a single-pollutant approach have identified a negative association for both all breast cancer risk and ER + breast cancer risk [23]. Finally, regarding the hypothesis of a negative association between PBB-153 and both ER + and all breast cancer risk, previous analyses in the same E3N nested case-control population have identified no association for all breast cancer risk or ER + breast cancer risk [23].
However, comparisons between the results of these studies, using a single pollutant approach, and our results, obtained from a mixture approach, must be made with caution. Indeed, the high correlation levels between substances could potentially lead to biased estimates of the associations between individual substances and breast cancer and this could explain the inconstant results between the different studies. Moreover, although the use of sums of substance, instead of individual substances, could have overcome the problems due to collinearity, this approach is based on the strong assumption that the aggregated substances have an additive effect, which is not necessarily true. The limits of these methods and the lack of studies evaluating mixtures of substances rather than individual or sums of substances could explain the global inconsistency of results. More studies assessing the impact of the choice between multi-pollutants versus single-pollutant approaches on results are needed to better understand these differences.

Strengths and weaknesses
This study presents some limitations. First, the study population was composed of women working for the French national education system, with globally a healthier lifestyle than the general population, so the extrapolation of results to the general population should be done with caution. In addition, the small study population size limited the statistical power. In particular, the small number of ER-cases limited the consistency of the results when performing stratified analysis based on hormone receptor status. The presence of chemicals with observations below the limit of detection should also be taken into account. We acknowledge that large left-censored data (i.e. > 25%) may have impacted regression based estimates, especially when using parametric methods constrained by distributional assumptions, which is not the case of BKMR [44]. To overcome this issue, we excluded substances for which more than the 25% of the values were left-censored; for such substances we opted for a substitution method (by LOD/2) considering the computational convenience and the little impact of the method chosen for managing low percentages of non-detects (< 25%) in principal component analysis [45]. Furthermore, the long average time between measurements and diagnosis in cases could have limited the ability to identify significant effects. Moreover, as mentioned earlier, when applying BKMR models, chemicals were grouped based on similar chemical structure and industrial use, thus limiting the hypothesis tested. Additionally, the single-measurement of exposures did not allow taking into account the trajectories of exposures to these substances over time. Finally, the current BKMR method available for binary outcomes is implemented with a probit link function instead of logit, which is more commonly used in case-control studies. However, we believe that the findings from our exploratory approach are not affected by the underlying link function because of their mathematical similarity whose differences would mainly affect testing or interpretation of the results [46,47]. In addition, we tried to take into account the matched design by adding the matching variable in the models.
The present study has also several strengths. First, the availability of numerous information collected in the E3N cohort has permitted to adjust models on main potential confounders for breast cancer. In addition, the present study investigated two methodological approaches, PCR and BKMR, that are adapted to the assessment of the effects of chemical mixtures rather than the singleexposures, and that take into account multicollinearity. These features allowed to highlight some positive trends between some PBDEs and PFAS considered as mixtures and breast cancer risk, which could be elusive in singlepollutant models. In addition, the combination of these two methods allowed generating more robust conclusions than those that could have been obtained applying a single method. In addition, these two approaches can be seen as complementary, PCR detecting associations with some components of correlated substances, and BKMR further allowing to identify interactions between substances and non-linear associations while accommodating confounding.

Conclusions
Combining the results of these two approaches has made it possible to formulate hypotheses about the components associated with breast cancer risk and to further hypothesize which substances contributing to these components could be responsible for the association identified, highlighting the direction of the relationship and taking multicollinearity into account. However, the insufficient number of subjects has not allowed identifying some potentially significant association with the BKMR approach, which resulted in very large credible intervals. Further studies evaluating mixtures of substances on larger sample sizes are needed.
Additional file 1: Table S1. Percentage of values below the Limit Of Detection (LOD) and associated decision for each substance measured in the present study. Table S2. Loading factors for each substances on each of the 4 components obtained by principal component analysis. Table S3. Associations between adherence to PCA components and all breast cancer risk. Components are used in continuous and in quintiles in conditionnal logistic regression models. Odds Ratio (OR) and 95% Confidence Intervals (CI) are presented. Figure S1. Correlation matrix between log-transformed exposure to substances in the study population. Pearson's rank correlation coefficients are presented. Figure S2-A. Conditional posterior inclusion probabilities of substances within group for all breast cancer risk. Figure S2-B. Conditional posterior inclusion probabilities of substances within group for ER+ breast cancer risk. Figure  S2-C. Conditional posterior inclusion probabilities of substances within group for ER-breast cancer risk. Figure S3. Exposure-response functions between exposure to Substance 1 and probit of probability of having a breast cancer while Substance 2 is fixed at defined percentiles (20th, 50th, and 80th), all other substances being fixed at their median value. Figure  S4. Cumulative effect of PFAS, PBB and PBDE for all breast cancer risk.